External drivers of BOLD signal’s non-stationarity

A fundamental challenge in neuroscience is to uncover the principles governing how the brain interacts with the external environment. However, assumptions about external stimuli fundamentally constrain current computational models. We show in silico that unknown external stimulation can produce error in the estimated linear time-invariant dynamical system. To address these limitations, we propose an approach to retrieve the external (unknown) input parameters and demonstrate that the estimated system parameters during external input quiescence uncover spatiotemporal profiles of external inputs over external stimulation periods more accurately. Finally, we unveil the expected (and unexpected) sensory and task-related extra-cortical input profiles using functional magnetic resonance imaging data acquired from 96 subjects (Human Connectome Project) during the resting-state and task scans. This dynamical systems model of the brain offers information on the structure and dimensionality of the BOLD signal’s external drivers and shines a light on the likely external sources contributing to the BOLD signal’s non-stationarity. Our findings show the role of exogenous inputs in the BOLD dynamics and highlight the importance of accounting for external inputs to unravel the brain’s time-varying functional dynamics.


SI1. Joint-estimation algorithm.
In what follows, we seek to determine the dynamics and input matrices, as well as the input sequence of the following dynamical system where x[k] ∈ R n denotes the state, u[k] ∈ R p denotes the input, A ∈ R n×n the dynamics matrix, and B ∈ R n×p the input matrix. Specifically, by considering a sequence of data {z[k]} N −1 k=0 , we seek to find an approximation of the parameters (A, B, {u[k]} N −1 k=0 ) denoted by (Ã,B, {ũ[k]} N −1 k=0 ), respectively. Next, notice that given the dynamical system (1), we obtain where the error can be captured by ε k ∼ N (0, Σ). Subsequently, in a leastsquares minimization sense, we seek to solve the following minimization problem Additionally, since both the input matrix and sequence of inputs are unknown, the problem is not well posed, which force us to consider possible feasibility constraints (e.g., ũ[k] ≤ 1), and the objective should be changed to account for a regularization term (e.g., the 1-norm) that penalize the number of non-zero entries, which lead to the following problem: where λ ∈ R + 0 is the regularization term that weights the tradeoffs between the approximation error and the number of nonzero entries. To determine the unknown parameters (Ã,B, {ũ[k]} N −1 k=0 ) that are the solution to the above problem, we will consider an algorithm that borrows ideas from the Expectation-Maximization algorithm 1 , which details can be found in 2;3 .
Briefly, we seek to determine a converging sequence of unknown parameters (B (l) , {ũ (l) [k]} N −1 k=0 ), which initial parameters (i.e., with l = 0) are set as follows: (i )Ã is the solution to (3) with input matrix and inputs set to zero; and (ii)B (0) is initialized at random which entries are obtained from standard Gaussians. Next, for a predefined value of λ in (4), we find a solution to it by proceeding sequentially as follows (for each iteration l = 1, 2, . . .): It is important to notice that several criteria can be adopted to stop the sequential procedure in the algorithm. In particular, we have considered the total variation between iterations on the inputs to be below a specified threshold. Lastly, to control the inputs' spatial and temporal sparsity independently for the analysis in S13 Fig and S14 Fig, we used two separate λ values, namely, λ U and λ B in steps (5), and (6).

SI2. LTI system's output simulation
We modeled the activity of four hypothetical observed regions as a fourdimensional LTI system and the influence of the unobserved regions and external stimuli into each node as an unknown driver. We used MATLAB's place.m function to design a synthetic system matrix with eigenmodes oscillating at 0.01 and 0.06 Hz to mimic the frequencies of the BOLD signal's neurophysiological component. We generated 100 different sets of 30-min simulated time series with the HCP dataset's fMRI sampling rate (1.4 Hz) from the same synthetic system matrix. We created external input patterns to mimic the conventional event-related and block-design paradigms (see manuscript Fig 1B). Specifically, we created periods of external stimulation with different durations and frequencies as follows: At period II (9-12 min) of the manuscript Fig 1B, the blue node is stimulated in 25 TRs = 18 second blocks, interleaved with similarly sized rest periods, and at period III (15-18 min) the blue node is stimulated for 7 TRs = 5.04 seconds, with inter-stimulus intervals of 3 TRs = 2.16 seconds. We generate different random normalized time series as the recording noise for each node. We select the external input to internal noise variance ratio = 1/16 and the signal to recording noise = 1000. We select low noise conditions in the manuscript Fig 1 to aid the visualization of the results. Therefore, we explore the effect of different internal and recording noise levels on our ability to extract the external inputs accurately more in-depth in S1 Fig. Specifically, instead of the aforementioned input patterns, we used two different sample input patterns from the HCP task protocols: the binarized social task regressor in S5A Fig and the visual cue condition of the motor task in S5E Fig. These results show that the measured outputs' magnitude and the statistical effect size are dependent on the input patterns. For instance, the t-values are almost twice as large in panel A than B under similar noise levels. We can also calculate these metrics from the HCP task data. In theory, the comparison between observed and modeled stimulusinduced BOLD changes should offer insight into the extent of internal and recording noise levels. In the S1C Fig, we provide the maximum (across all ROIs and 0-12 TR lags) the normalized magnitude of the task-related changes in the BOLD and their corresponding t-values for six HCP task scans. These results show that the modeled outcomes from the two sample tasks do not align well. We believe this discrepancy can be partly explained by the unaccounted non-linearity (e.g., hemodynamic filtering) and the differences between the empirical and synthetic systems' spectral profile, size, and input foci. Nevertheless, this framework provides a guideline for what to expect in low or high noise conditions. For instance, these results show that relatively higher levels of external input and high signal-to-noise conditions result in more considerable divergence of the estimated system matrices from the true underlying system, and consequently, a higher estimation error in inputs.
However, as discussed earlier, the exact contributions of internal and external noise and external inputs to the BOLD signal are unknown. Therefore, we only sampled a few noise levels to understand how different noise levels will affect the accuracy of the estimated inputs. Nevertheless, these results show that the signal-to-noise levels close to the possible empirical values can result in different outcomes depending on the pattern of inputs, the utilized system matrix, and the internal noise levels. More importantly, these results provide evidence that the system matrices estimated during periods without stimulation are only beneficial in relatively higher input magnitude and signal-to-noise levels. Together, these observations combined with the main manuscript results demonstrating the utility of system matrices estimated during resting-state in the accurate identification of external inputs, suggest the relatively high levels of external input's magnitude during task scans.

SI3. The dimensionality of external inputs.
One of the most critical parameters of the model is the input matrix B dimensions. This matrix gives us information on the inputs' spatial patterns. However, the dimensions of these external inputs are not known a priori, and naturally, our ability to accurately capture the external inputs depends on this critical parameter. Specifically, input matrices B with dimensions lower than those of the external inputs will not extract all input patterns correctly, whereas higher dimensional input matrices are likely to capture some of the internal and recording noise as external inputs.
Therefore, since we do not know the dimensionality of the external inputs in BOLD, we approximated the dimensions of the input matrix B by performing principal component analysis on the residuals of the models. As seen in S19 Fig, principal components 1-25 capture more than 80% of variance in the average residuals and more than average 60% of subject-level residuals' variance across all tasks. In addition, we compared the goodness-of-fit of the LTI model with and without external inputs using Akaike information criterion (AIC) 4 . Our results demonstrate that incorporating external inputs does not lead to overfitting and improves the model's fit -an effect most pronounced in higher dimensional input matrices (S20 Fig). Finally, we demonstrate that we identify the external inputs during the motor task similarly at high-dimensional input matrices (S18 Fig), as indicated by the high correlation (>0.8) of inputs estimated using input matrix dimensions higher than 25 (S18I Fig).
Together, these results show that higher-dimensional input matrices are required to capture the spatiotemporal profiles of the inputs during the motor paradigm. We expect these results as the motor task conditions activate different specialized brain regions along the somatomotor cortices. Therefore, we select p = 25 for input matrix B for the main manuscript results, much higher than the anticipated task conditions yet small enough to minimize the unwanted noise captured by high dimensional inputs. Nevertheless, these additional analyses also demonstrate the robustness of the captured spatiotemporal input profiles to the choice of high dimensional B matrices.
SI4. Determining the optimal number of eigenvector clusters.
To find the optimal number of eigenvector clusters, we used the following methods.

Elbow method
K-means clustering identifies clusters such that it minimizes the withincluster variance (or the within-cluster sum of squared errors (SSE)). Therefore, ideally, we want this error to be as small as possible. However, increasing the number of clusters always results in a smaller sum of squared errors. The Elbow method is based on plotting SSE as a function of number clusters and identifying a k such that adding another cluster does not result in notable change or reduction in SSE. Generally, the 'elbow' (bend) in the SSE curve indicates the optimal number of clusters. However, it is worth noting that the elbow is not always readily identifiable, especially if the data is not very clustered.

Calinski-Harabasz criterion
We used the criterion proposed by 5 , also known as the variance ratio criterion defined as: where k is the number of clusters, N is the number of observations, and SS B and SS W are the total between-cluster and within-cluster variance, respectively. SS B and SS W are defined as: where k is the number of clusters, x represents a data point, c i represents the i th cluster, n i is the number of points in the i th cluster with the centroid m i , m is the total mean of the data, x − m i and x − m i the Euclidean distance (L 2 norm) between the two vectors. Since well-defined clusters display a high between cluster variance and low within-cluster variance, we find the optimal number of clusters by identifying a k that maximizes the variance ratio criterion.

Davies-Bouldin criterion
The Davies-Bouldin criterion 6 captures the ratio of within-and betweencluster distances and is defined as where D i,j is the ratio of within-to-between cluster distance for clusters i and j, which is defined as whered i andd j are the average distances between each data point in the i th and j th clusters to their own cluster centroids. d i,j is the distance (Euclidean) between the centroids of clusters i and j. Therefore, the optimal k minimizes the Davies-Bouldin index and represents the best within-to-between cluster distance ratio.

Silhouette criterion
Silhouette criterion 7 is a measure of similarity of each point to other points in its cluster, compared to data points in other clusters and is defined as: where a i is the average distance between the i th data point to the other points in its own cluster, and b i is the minimum average distance between i th data point to data points in different clusters (minimized across all clusters). Therefore, a high Silhouette score (ranging between 1 and -1) indicates that the data point is well grouped within its own cluster and poorly matches the data points from other clusters. Conversely, many data points with zero or negative Silhouette values indicate the presence of few or many clusters in the data. Prior research has demonstrated inhomogeneity in the spectral profile of the BOLD signal across the brain 8;9;10;11 . Hence, we hypothesize that there is inhomogeneity in the frequency and damping of eigenmodes. To capture the spatial patterns of activity, we use our method to estimate the internal system parameters from the resting-state time series (1200 TR ≈ 14.5 min). To identify the eigenmodes with similar spatial patterns across subjects, we aggregate all subjects' eigenvectors and perform k-means clustering analysis.
We SI5. The sparsity of system matrix and external inputs.
The high dimensionality of the BOLD signal poses a problem from the model fitting perspective as the highly parameterized models can overfit. Therefore, a good fit in such models may not indicate an accurate identification of the underlying parameters, but instead, the overabundance of parameters can lead to misidentified parameters and small residuals. Thus, it is beneficial to test the sensitivity of the estimated system and inputs' parameters and the model's goodness-of-fit. Therefore, in addition to the leastsquares method detailed in SI4, we estimated sparse system matrices from the resting-state time series using LASSO regularization. We estimated the sparse system matrices using 0.1, 0.01, and 0.001 regularization parameters that result in roughly 95%, 50%, and 5% sparsity. Commonly to account for overfitting, a cost function is defined comprising the error and the number of parameters to penalize over-parameterization. We used Akaike information criterion (AIC) 4 to test the effect of system matrix sparsity (i.e., number of non-zero parameters) on the model's goodness-of-fit. Following 13 , we used AIC = m ln(RSS/m) + 2k to calculate AIC values (m number of observations, k number of parameters, and RSS residual sum of squares). S12 Fig shows the AIC values for A matrices with different sparsity levels and different input matrix B dimensions. These results suggest that full system matrices in the main manuscript are not likely overfitting the data, as indicated by notably higher AIC values of sparse matrices.
Next, we examined the effect of input sparsity on the input estimation accuracy using a multiple linear regression model and the known task regressors. In the main text, we tuned both input U and input matrix B with a single regularization parameter, which resulted in 6.76 ± 0.8% and 5.91 ± 0.4% average zero elements in B and U , respectively. Here, in order to control the sparsity of U and B independently, we introduced two separate spatial and temporal regularization factors, namely λ B and λ U , respectively. Sweeping both regularization parameters allows us to identify subject-specific λ B and λ U values that result in 20%, 50%, and 80% sparsity in both U and B matrices (S13A- B Fig). Overall our results suggest that increasing the sparsity of inputs beyond 20 % results in reduced accuracy of the estimated inputs (S13C Fig).
We also performed PCA on the estimated inputs concatenated across all subjects and identified the subject-level input with the highest loading for PCs 1-10. Finally, we repeated this analysis for different input sparsity and input matrix B dimensions. S14A Fig shows the mean accuracy of the estimated inputs. Since the same input structures can manifest in different PCs depending on the input sparsity and input matrix B dimensions, we performed k-means (k = 10) clustering on the spatial profiles of the inputs (S14B Fig). We identified these spatial patterns from the t−test values of brain regions with significant (t−test, p < 0.05, FDR corrected for multiple comparisons across ROIs) loadings on input matrix B rows corresponding to the aforementioned PCs. S14C Fig shows the centroid of the identified clusters 2-7. We did not present cluster 1 since all the values were zeros. Together these results suggest that although overall high input sparsity can reduce the estimation accuracy, the input sparsity and dimensions can be further tuned to capture different input patterns optimally.

SI6. Effect of estimation window's length on input's accuracy.
We leveraged the full-length (≈ 14.5 min) resting-state time series to estimate the system parameters due to the high dimensions of the system and the slow oscillations in the BOLD. Here we examined the effect of window size on the accuracy of the estimated inputs. We provide the mean accuracy of the subject-and group-level (mean) estimated inputs for all brain regions across several lags for a sample task (Motor). These results show that the subject-level system matrices estimated using both short (3-minute) and full-length resting-state time series increase the estimated inputs' accuracy (measured via R 2 of the multiple linear regression fit) compared to system matrices estimated during the task scan (S8 Fig). Although we find similar results are the group-level average means, interestingly, the system matrix estimated using short window results in overall higher input accuracy. For instance, the R 2 values are higher in the early visual ROIs for full-length window estimates, but the vast majority of the inputs other ROIs have higher R 2 values in the system estimated using the short window.
In addition, PCA analysis on the estimated inputs similar to the manuscript Fig 6 reveals that analogous components emerge in both short and full-length window estimates. As seen in S15 Fig, PC 1-4 are identical on panels C and D. Nevertheless, statistical compassion between the inputs corresponding to each PC reveals that the estimated inputs' time series are significantly (Wilcoxon rank-sum test, p < 0.05) more accurate (measured via R 2 ) for the aforementioned PCs in short window estimates. We speculate that the resting-state scan's unaccounted inputs likely contribute to the higher error in the full-length estimation window. In addition, the inhomogeneous distributions of eigenmodes' spatiotemporal profiles can also play a role, as longer estimation windows can lead to inaccurate estimation of higher frequency and transient activity, especially in the presence of high-frequency noise. Together, these results show that resting-state estimated system matrices offer a suitable approximation of the underlying system and that our reported findings in the main manuscript are robust to estimation window length.

SI7. Principal component analysis of LTI model's residuals.
In addition to our proposed method for estimating external inputs based on spatial and temporal regularization of the LTI model's residuals, we also decomposed residuals using the principal component analysis to investigate the robustness of the estimated inputs' profiles to the choice of factorization method. We concatenated residuals' PCs 1-25 across all subjects and performed a group-level principal component analysis. We provide the temporal profiles of the group-level residuals' PCs 1-25 in S16A show that as mentioned earlier, these estimated input PCs capture a rapid sequence of inputs to different brain regions following the task block's offset. S8A Fig (bottom panel) shows that these rapid changes following the task block's offset are not present in the group-level residual PCs. These results suggest that despite the overall high similarity between characterized spatiotemporal profiles of external inputs across both methods, the spatiotemporal regularization approach is more likely to identify the spatially sparse and transient inputs. d.

Pre-Stimulation Stimulation
Pre-Stim. minus Stim. The color-coded surface shows the average magnitude and variance of the input-induced changes in the system outputs. Specifically, the colors indicate the maximum (across all regions) average magnitude of the input-related changes in the normalized outputs, and the z-axis shows the t-values calculated from comparing the simulated outputs between periods with and without inputs. The x-axis shows the internal noise to external input ratio, and the y-axis shows the total drivers (i.e., external input plus internal noise) to the recording noise ratio. We used the same synthetic four-dimensional system in manuscript all eigenvectors from all subjects were normalized and clustered into 2 clusters and (B) shows 4 clusters using the k-means clustering algorithm with k =4. We colorcoded the clusters identified across all subjects and sessions (99 subjects × 4 sessions × 100 eigenmodes, i.e., 39600 eigenvalues). The brain overlays represent a spatial distribution of the eigenvector associated with an eigenvalue at the centroid of each cluster (displayed with the same color code). To aid the visualization, the centroids were normalized after subtracting from each centroid its minimum element. The color-bar represents the normalized values of centroids for all clusters.  To evaluate the optimal number of clusters (k) present in the eigenvectors we used (A) Calinski-Harabasz criterion (i.e., the ratio of within to between cluster dispersion) 5 , (B) Davies-Bouldin criterion (i.e., the ratio of within-cluster and between-cluster distances) 6 , and (C) Silhouette criterion (i.e., the measure of how similar each data point is to other points in its cluster, compared to points in other clusters) 7 . Based on all three criteria, k = 2 is optimal number clusters in the data -see SI4 for more details regarding these criteria. (D) The plot represents the sum of squared errors (i.e., distances to cluster centers), and the inset plot the percent variance explained by the different number of clusters. Note that due to the smoothness of the curve, we can not identify a single k that corresponds to the 'elbow' of the curve in panel D. Together, these results indicate that the data may not contain clearly defined clusters at k > 2. Therefore to examine the organization of eigenvectors at higher resolutions, in the main manuscript, we choose k = 4 that roughly corresponds to the curve's elbow.   Figure 5: Sample session's task regressors (A-F) A sample scanning session's boxcar regressor for different task conditions (social, working memory, gambling, language, visual cue condition of motor, and relational tasks, respectively). Different task cues and/or blocks are marked by '1' and '2' on y-axis, whereas '0' represents inter-stimuli rest intervals. Since the spectral profile of the task regressors are identical across subjects, we used regressors in (A-F) for our analysis.   Figure 6: Average estimated inputs and input matrix B dimension. Internal system parameters (i.e., A matrices) during full-length resting-state and motor scans were used to estimate the external inputs (B × U ) in panels A-H. We used input matrix B with 1, 2, 5, 10, 25, 50, 75, and 100 in panels A-H, respectively. Time points without significantly higher or lower average inputs estimated during motor task than resting-state scans (paired t−test, p < 0.05, false discovery rate (FDR) corrected for multiple comparisons) presented in gray (i.e., 0). Top plots in panels A-C show onsets and durations of visual cues and motor task conditionsleft foot, left hand, right foot, right hand, and tongue movement blocks.  Figure 7: Average estimated external inputs for the motor task. Internal system parameters (i.e., A matrices) during full-length resting-state and motor scans were used to estimate the external inputs in panels A and C, respectively. Panels B and D show time points form panels A and C with significantly higher or lower average inputs estimated during motor task than phase-randomized null time series (paired t−test, p < 0.05, false discovery rate (FDR) corrected for multiple comparisons). Top plots in panels A&B show onsets and durations of visual cues and motor task conditions -left foot, left hand, right foot, right hand, and tongue movement blocks.   Figure 9: Motor task activation and their corresponding estimated external inputs. Panels A,B,E,F, and I show BOLD activation maps for the left hand, right hand, left foot, right foot, and tongue movement. The color-coded maps indicate t−test values of brain regions with significant (t−test, p < 0.05 , Bonferroni corrected for multiple comparisons over regions) coefficients of the general linear model (i.e., task condition versus baseline contrast). For this analysis we convolved task regressors with the hemodynamic response function (HRF) using spm hrf.m MATLAB script from the SPM toolbox 14 . To account for hemodynamic delay variability over brain regions, we created 7 different sets of regressors with HRF function's peaks varying from 3 to 9 seconds (relative to the onset with one second increments). For each region, we fit 7 independent GLMs using these regressor sets and used the GLM coefficients with the highest R 2 values to compute the activation maps. The brain regions identified in these panels match the prior reported task-specific BOLD activations 15;12;16 . Color-coded panels (C,G,H,J) show the t−test values of brain regions with significant (t−test, p < 0.05, FDR corrected for multiple comparisons across ROIs) loadings on input matrix B rows corresponding to PCs 2, 4, 9, and 6. (D) t−values calculated from coefficients of multiple linear regression models of estimated external inputs associated with each PC (see methods for details). The average coefficients that fail to pass the significance-level across subjects (t−test , p < 0.05, Bonferroni corrected for multiple comparisons) are depicted in gray. Since estimated inputs activity seem to lag behind task regressors, we preformed the regression analysis using 12 different sets of time-shifted task regressors (from 0 to 11 samples shift (TR= 0.72 sec)). The coefficients that fail to pass the significance-level across subjects (t−test , p < 0.05, Bonferroni corrected for multiple comparisons) are depicted in gray. Full matrix 5% zero 95% zero 50% zero Figure 12: Sparsity of the system matrix A and model's goodness-of-fit.

18
(A) We assessed the model's goodness-of-fit using Akaike information criterion (AIC) 4 for a sample task (motor) and different levels of sparsity constraints on the system matrices. Scatter plots show the mean AIC, and the error bars the standard deviations for different input matrix B dimensions and color-coded sparsity levels.
We used system matrices estimated from subjects' full-length resting-state fMRI scans as the system matrices.  Figure 13: Estimated input's sparsity and accuracy. The sparsity values (i.e., percentage of zero elements) of the estimated inputs U (A) and input matrix B (B) for a sample subject, calculated over different temporal (λ U ) and spatial (λ U ) regularization factors. Red numbers 1-3 indicate the identified regularization factors that lead to both input matrices displaying 20, 50, and 80 % sparsity, respectively. (C) The average R 2 values were calculated using multiple linear regression models of estimated external inputs across all brain regions. Inputs are estimated using the regularization parameter used in the main manuscript (λ = 0.5), which results in low sparsity U (5.91±0.4%) and B (5.91±0.4%) matrices. We also tested various lags (samples) between modeled inputs and the known task (motor) regressors to account for hemodynamic response delays. (D-F) Same results as panel C, but we identified the subject-specific spatial (λ B ) and temporal (λ U ) regularization parameters as detailed in (A-B) for 20, 50, and 80% input sparsity levels, respectively. Statistical comparisons (paired t−test, p < 0.05, Bonferroni corrected for multiple comparisons across all regions and lags) between panels C and D reveal that overall R 2 values are similar for low sparsity levels. Nevertheless, the mean R 2 values calculated from the inputs to task-related regions (e.g., somatomotor (SM) cortices) are significantly lower in panels E&F compared to panel C. c.     The error bars in all panels indicate the standard errors. We used Welch's t-test 17 to test the statistical significance of differences between identified ROIs' estimated inputs' non-stationarity across all possible pairs of resting-state networks. Comparisons that passed the significance-level pre and post FDR correction for multiple comparisons are maker with black and red '*', respectively.  Figure 20: Goodness-of-fit of LTI model using Akaike information criterion (AIC). We assessed the model's goodness-of-fit using Akaike information criterion (AIC) 4 for a sample task (motor) with (i.e., input matrix dimension =0) and without external inputs. We used system matrices estimated from subjects' full-length resting-state fMRI scans as the system matrices. Following 13 , we used AIC = m ln(RSS/m) + 2k to calculate AIC values (m number of observations, k number of parameters, and RSS residual sum of squares). Statistical comparisons reveal that AIC values are significantly (t−test, p < 0.05, Bonferroni corrected for multiple comparisons) smaller (i.e., better fit) when input matrices are included in the model and that higher-dimensional inputs further improve the fit. These results also suggest that adding external input parameters does not lead to over-fitting.